Communities are clusters of nodes with a higher than average density of internal connections.
Their detection is of great relevance for understanding the structure and hierarchies present in a
network. Modularity has become a standard tool in the area of community detection, providing at
the same time a way to evaluate partitions and, by maximizing it, a method to find communities.
In this work, we study modularity from a combinatorial point of view. Our analysis (like the
definition of modularity itself) relies on the configurational model, a technique that, given a
graph, produces a series of randomized copies keeping the degree sequence invariant. We develop an
approach that enumerates the null model partitions and can be used to calculate the probability
distribution function of the modularity. Our theory allows for a deep inquiry into several interesting
features characterizing modularity, such as its resolution limit and the statistics of the partitions
that maximize it. Additionally, the study of the probability of extremes of the modularity in the
random graph partitions opens the way for a definition of the statistical significance of network
partitions.

I. INTRODUCTION

Graphs are used as mathematical representations of complex systems. Examples can be found in
biology, technology, and the social and information sciences [1-3]. Real-world networks show
several nontrivial topological features, among which one of the most fascinating is the
organization of their nodes in local clusters or modules known as communities. Communities are
groups of nodes with a high level of internal and a low level of external connectivity. They are
subgraphs relatively isolated from the rest of the network and are expected to correspond to
groups of elements sharing common features and/or playing similar roles within the original
system. The last few years have witnessed an increasing interest in defining and identifying
communities [4-12] (see [13] for a recent review). Different methods have been proposed, ranging
from topological considerations [6, 12] to the study of the influence that communities have on
the properties of dynamical processes running on the network, such as random-walk diffusion [12]
or the Potts model.

A major role in this context is played by the modularity function Q introduced by Newman and
Girvan [6]. The modularity is a quality measure aimed at quantifying the relevance of the
community structure in a network partition. It is defined as

$$ Q_C = \frac{1}{M} \sum_{\varphi=1}^{C} \left( e_{\varphi,\varphi} - \langle e_{\varphi,\varphi} \rangle \right) , \qquad (1) $$

where M is the total number of links in the network, the sum runs over the C communities of the
partition, $e_{\varphi,\varphi}$ stands for the number of internal links in the community
$\varphi$, and $\langle e_{\varphi,\varphi} \rangle$ is the expected value of this quantity in a
random null model (typically, the configurational model). The modularity thus compares the actual
number of internal links of the modules with the number they would have in a random null model.
The partition with maximal Q is then considered the best and most significant division of the
network into communities [6]. The search for such an optimal partition is in general a great
challenge, since it was proved to be an NP-complete problem [14]. Many heuristics relying on
different approaches have been introduced to approximate the optimal partition: some are based on
hierarchical cluster division or aggregation methods [6, 15-22], others on simulated annealing
[10, 23], spectral methods [24-26], genetic algorithms [27], or extremal optimization [28], to
mention a few. Still, modularity maximization as a procedure for community detection is not free
of drawbacks. It was shown that the modularity suffers from resolution limits [29, 30], not being
able to discern the quality of modules smaller than a certain size ($\sqrt{M}$). Also, optimized
partitions have nonzero modularity even in random graphs [31], which raises the question of the
significance of a partition. Finally, the huge number of degenerate local maxima of Q in common
examples can in practice prevent the true optimal partition from being found [32].

In this paper, we choose a different route to study the modularity function, trying to shed some
additional light on its limits and intrinsic properties. We develop a combinatorial method to
estimate the distribution of modularity values in the partitions of the configurational model
[33-35]. Our approach leads us to write explicit formulas for the modularity distribution and to
analyze the characteristics of this function in detail. We focus our attention on the resolution
limit of modularity [29], showing that, even in the case of random networks, modularity prefers
to merge small groups into larger ones. We also focus on the evaluation of the statistical
significance of communities, basing our estimates on the probability associated with modularity
in the configurational model and extending previous results on the topic [31, 36].

The paper is organized as follows. In Section II we introduce the configurational model (i.e.,
the null model of modularity) and propose a combinatorial approach for the study of the
partitions of its networks. In particular, subsection II A is devoted to the description of the
model, while subsection II B deals with the theory of communities in the configurational model.
In subsection II C we show how to estimate the number of internal connections of a community. In
Section III we start the analysis of the modularity function in the configurational model: we
show exact expressions for the probability distribution function of modularity and analyze its
main features. In Section IV we focus on the statistics of the maximal modularity in the
configurational model and propose a simple but efficient way to determine the statistical
significance of partitions in networks. In Section V we extend our theory to the case of directed
and bipartite networks. We draw our final comments and considerations in Section VI.



II. STATISTICAL MODEL

The configurational model is a prototypical algorithm for the generation of uncorrelated networks
with a prescribed number of nodes and of node connections (degrees). The procedure for
constructing the random networks was originally introduced by Molloy and Reed in Ref. [33]. This
model has been the subject of many research papers over the last decade. Typical properties
observed in real networks are generally tested against the model graphs in order to assess
whether they are genuine or just induced by the constraints to which the network is subjected,
such as keeping the degree sequence invariant. Examples range from the simple determination of
degree-degree correlations [37] to clustering [38]. Community structure, which can be seen as a
correlation between connections at a local level, must also be tested against a null model. The
modularity function, which has become the standard tool in community detection, is defined using
the configurational model as null model [6]. Modularity in fact compares the number of
connections between nodes of the same module with the one expected on average in the
configurational model, i.e., for random networks with the same set of vertices and node degrees
as the given graph. Before going further, it is worth stressing that other, more or less
restrictive, null models can also be employed in defining the modularity function [39-41]. We
chose the configurational model as a paradigmatic example for our analysis essentially because of
its simplicity, because it was the original null model in the definition of Q, and because it
remains the most extensively used.

In the next subsections, we study the configurational model in detail. We propose a combinatorial
approach for the enumeration of all possible network partitions belonging to the ensemble
generated by the model and formulate exact expressions for the probability of the number of
internal connections of their modules. The whole theory therefore represents a combinatorial
approach to the configurational model with explicit application to modularity.


A. The configurational model

The basic ingredients of the configurational model are the number of nodes and the degree
sequence of the network nodes. Consider therefore a network composed of N nodes and denote the
degree of the j-th node by $k_j$. The full degree sequence is then the set
$\{k_i\} = \{k_1, k_2, \ldots, k_N\}$. The procedure to construct the networks is very simple:
each node j is connected to $k_j$ other randomly chosen nodes, always satisfying the constraint
that the entire degree sequence is kept constant. We consider first the case of undirected
networks. For this class of networks, the sum of all degrees must be an even number and we can
thus write

$$ \sum_{i=1}^{N} k_i = 2M . \qquad (2) $$

Figure 1: (Color online) A simple network generated according to the configurational model. The
network is composed of N = 6 nodes and M = 6 edges. The degree sequence is
$\{k_i\} = \{k_1 = 3, k_2 = 4, k_3 = 2, k_4 = 1, k_5 = 1, k_6 = 1\}$. (a) A sequence of node
labels is generated and (b) connections are drawn in the network according to it. If node labels
are replaced by community labels (in this case, $\sigma_1 = \sigma_5 = \triangledown$,
$\sigma_2 = \sigma_4 = \circ$, $\sigma_3 = \sigma_6 = \square$), the network in (b) can be seen
as a graph between C = 3 communities with degree sequence
$\{d_\alpha\} = \{d_\triangledown = 4, d_\circ = 5, d_\square = 3\}$. In this particular case,
the measured values of intra- and inter-community connections are
$\{e_{\alpha,\alpha}, e_{\alpha,\beta}\} = \{e_{\triangledown,\triangledown} = 1, e_{\circ,\circ} = 0, e_{\square,\square} = 0, e_{\triangledown,\circ} = 2, e_{\triangledown,\square} = 0, e_{\circ,\square} = 3\}$.

The generation mechanism of the configurational model can be formulated in an alternative manner
(see Figure 1): (i) Randomly fill a list composed of 2M entries with node labels ranging from 1
to N, where the number of appearances of each label is equal to the respective node degree;
(ii) Draw a connection between each pair of nodes whose labels appear at positions $p_{2k-1}$ and
$p_{2k}$, for each k = 1, 2, ..., M. It is clear that this construction procedure does not avoid
multiple connections and self-loops. Their presence can however be considered negligible under
certain realistic assumptions [35], in simple words when no node concentrates a significant
fraction of the network connections [42].
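The list construction of steps (i) and (ii) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `configuration_model_edges` and its `seed` argument are assumptions introduced here, and, as noted in the text, self-loops and multiple connections are not filtered out.

```python
import random
from collections import Counter


def configuration_model_edges(degrees, seed=None):
    """Return M edges drawn by the list construction (hypothetical helper).

    degrees[j - 1] is the degree k_j of node j; nodes are labeled 1..N.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of all degrees must be even (it equals 2M)")
    rng = random.Random(seed)
    # Step (i): a list of 2M entries in which label j appears k_j times,
    # arranged in random order.
    labels = [j for j, k in enumerate(degrees, start=1) for _ in range(k)]
    rng.shuffle(labels)
    # Step (ii): connect the labels found at positions p_{2k-1} and p_{2k}.
    return [(labels[i], labels[i + 1]) for i in range(0, len(labels), 2)]


# Degree sequence of Figure 1: N = 6 nodes, M = 6 edges.
degrees = [3, 4, 2, 1, 1, 1]
edges = configuration_model_edges(degrees, seed=0)
# Count how many edge ends each node received (a self-loop counts twice).
observed = Counter(node for edge in edges for node in edge)
```

Whatever the random order of the list, every node ends up with exactly its prescribed degree, which is the only property the null model is required to preserve.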

The construction procedure just introduced is the most common technique to build the graphs of
the configurational model. Note that it samples homogeneously from the set of all possible
sequences of node labels, not from the set of all possible graphs with the given degree sequence.
The reason is that the same graph may be represented by different sequences of node labels, and
its multiplicity may vary as a function of several factors (e.g., the number of self-loops,
multiple connections, etc.). The total number of possible sequences of node labels with
prescribed degree sequence $\{k_i\}$ is simply given by

$$ \binom{2M}{k_1, k_2, \ldots, k_N} = \frac{(2M)!}{k_1!\, k_2! \cdots k_N!} . \qquad (3) $$

The expression in Eq. (3) is a multinomial coefficient and counts the total number of ways of
arranging the N node labels with multiplicities $\{k_i\}$ subject to the constraint of Eq. (2).
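As a quick numerical illustration of Eq. (3), the multinomial coefficient can be evaluated exactly with integer arithmetic. The helper name `label_sequence_count` is an assumption made for this sketch; the degree sequence is the one of Figure 1.

```python
from math import factorial, prod


def label_sequence_count(degrees):
    """Number of distinct label sequences with multiplicities given by degrees."""
    return factorial(sum(degrees)) // prod(factorial(k) for k in degrees)


# Degree sequence of Figure 1 (a list of 2M = 12 entries).
n_sequences = label_sequence_count([3, 4, 2, 1, 1, 1])
print(n_sequences)  # 1663200
```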



B. Communities in the configurational model 

Consider now a partition of the network into C groups or communities. By partition we mean a
division of the nodes into several non-overlapping groups. The degree $d_\varphi$ of the group
$\varphi$ (where $\varphi$ can be 1, ..., C) is given by the sum of the degrees of all nodes
belonging to it,

$$ d_\varphi = \sum_{i \in \varphi} k_i . \qquad (4) $$



The network between communities in the configurational model is equivalent to a configurational
model composed of C "super nodes", one per group, with degree sequence
$\{d_\alpha\} = \{d_1, d_2, \ldots, d_C\}$ (see Figure 1). Similarly to the argument leading to
Eq. (3), also in this case the total number of sequences of community labels can be written as

$$ \mathcal{T}_C(\{d_\alpha\}) = \binom{2M}{d_1, d_2, \ldots, d_C} = \frac{(2M)!}{d_1!\, d_2! \cdots d_C!} . \qquad (5) $$



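If we denote by $e_{\varphi,\theta}$ the number of edges present between the $\varphi$-th and the
$\theta$-th community, then, since the network is undirected, by symmetry
$e_{\varphi,\theta} = e_{\theta,\varphi}$ for any $\varphi$ and $\theta$. The inter-community
links are complemented by the internal group links, denoted by $e_{\varphi,\varphi}$ for each
group $\varphi$. By definition, these quantities must obey the C relations

$$ d_\varphi = e_{\varphi,\varphi} + \sum_{\theta=1}^{C} e_{\varphi,\theta} , \qquad (6) $$

because the degree of the $\varphi$-th community is equal to the number of edges having only one
end in $\varphi$ plus twice the number of edges having both ends in the group. Given a particular
set of values for the intra- and inter-community edges, namely
$\{e_{\alpha,\alpha}, e_{\alpha,\beta}\} = \{e_{1,1}, e_{2,2}, \ldots, e_{C,C}, e_{1,2}, e_{1,3}, \ldots, e_{1,C}, \ldots, e_{C-1,C}\}$,
the total number of sequences of community labels that satisfy these requirements is

$$ \mathcal{N}_C(\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}) = M! \, \prod_{\varphi=1}^{C} \frac{1}{e_{\varphi,\varphi}!} \; 2^{\sum_{\varphi=1}^{C-1} \sum_{\theta=\varphi+1}^{C} e_{\varphi,\theta}} \prod_{\varphi=1}^{C-1} \prod_{\theta=\varphi+1}^{C} \frac{1}{e_{\varphi,\theta}!} . \qquad (7) $$

Eq. (7) states that the number of sequences of community labels with given intra- and
inter-community edges $\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}$ can be obtained as the product of
three factors: (i) $M!$, the number of permutations of the M edges;
(ii) $\prod_{\varphi=1}^{C} 1/e_{\varphi,\varphi}!$, the inverse of the number of ways to list
all the intra-community edges; and
(iii) $2^{\sum_{\varphi=1}^{C-1} \sum_{\theta=\varphi+1}^{C} e_{\varphi,\theta}} \prod_{\varphi=1}^{C-1} \prod_{\theta=\varphi+1}^{C} 1/e_{\varphi,\theta}!$,
the inverse of the total number of ways to arrange all the inter-community edges, where in
particular the factor $2^{\sum_{\varphi=1}^{C-1} \sum_{\theta=\varphi+1}^{C} e_{\varphi,\theta}}$
is needed because an inter-community edge is the same regardless of the order in which its two
community labels appear in the list (i.e., $e_{\varphi,\theta} = e_{\theta,\varphi}$ for any
$\varphi$ and $\theta$). The probability of observing a particular sequence of community labels
with a certain set of values $\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}$ is therefore given by the
ratio between the quantities defined in Eqs. (7) and (5),

$$ \mathcal{P}_C(\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}) = \frac{\mathcal{N}_C(\{e_{\alpha,\alpha}, e_{\alpha,\beta}\})}{\mathcal{T}_C(\{d_\alpha\})} . \qquad (8) $$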
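The counting argument behind Eq. (7) can be checked by brute force on a toy case. The sketch below enumerates every distinct sequence of community labels for three communities of degree 2 each (an arbitrary choice made here for illustration, not an example from the text), pairs consecutive entries, and compares the tally with the closed formula. The function names and the dictionary layout for the edge counts are assumptions of this sketch.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_sequences(edge_counts):
    """Closed formula of Eq. (7).

    edge_counts maps a pair (phi, theta), with phi <= theta, to e_{phi,theta}.
    """
    m = sum(edge_counts.values())
    denominator = 1
    inter_edges = 0
    for (phi, theta), e in edge_counts.items():
        denominator *= factorial(e)
        if phi != theta:
            inter_edges += e  # each inter-community edge can be listed in 2 orders
    return factorial(m) * 2 ** inter_edges // denominator


def brute_force_counts(labels):
    """Tally the distinct label sequences by the edge counts they produce."""
    tally = Counter()
    for sequence in set(permutations(labels)):
        pairs = Counter(
            tuple(sorted(sequence[i:i + 2])) for i in range(0, len(sequence), 2)
        )
        tally[frozenset(pairs.items())] += 1
    return tally


# Three communities with d = (2, 2, 2), hence M = 3 edges.
tally = brute_force_counts((1, 1, 2, 2, 3, 3))
matches = all(count_sequences(dict(key)) == n for key, n in tally.items())
total = sum(tally.values())  # compare with Eq. (5): 6! / (2! 2! 2!) = 90
```

In this toy case the five admissible edge configurations account for all 90 sequences, so the probabilities of Eq. (8) sum to one.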



C. Internal connectivity of communities 

In the case of communities, we are generally not interested in the whole set
$\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}$ of intra- and inter-community edges, but only in the
set of possible sequences with a given intra-community edge sequence $\{e_{\alpha,\alpha}\}$.
This basically amounts to calculating the marginal distribution of the probability in Eq. (8) by
summing over all the possible configurations of the inter-community edges $\{e_{\alpha,\beta}\}$,

$$ \mathcal{P}_C(\{e_{\alpha,\alpha}\}) = \frac{1}{\mathcal{T}_C(\{d_\alpha\})} \sum_{\{e_{\alpha,\beta}\}} \mathcal{N}_C(\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}) , \qquad (9) $$

where in the sum the inter-community edges $\{e_{\alpha,\beta}\}$ are subject to the constraints
of Eqs. (6).

$\mathcal{P}_C(\{e_{\alpha,\alpha}\})$ is the probability that groups of nodes with degrees
specified by $\{d_\alpha\}$ have internal connections equal to the sequence
$\{e_{\alpha,\alpha}\}$, under the hypothesis that connections have been drawn according to the
rules of the configurational model. The distribution $\mathcal{P}_C(\{e_{\alpha,\alpha}\})$ can
be easily obtained for C = 2 and C = 3. In these cases, the inter-community edges
$\{e_{\alpha,\beta}\}$ are completely determined by the constraints of Eqs. (6) given the numbers
of intra-community edges $\{e_{\alpha,\alpha}\}$, hence no sum is actually required. For example,
for C = 2 Eq. (9) becomes

$$ \mathcal{P}_2(\{e_{1,1}\}) = \frac{M!\, d_1!\, (2M - d_1)!\; 2^{d_1 - 2 e_{1,1}}}{(2M)!\; e_{1,1}!\, (M - d_1 + e_{1,1})!\, (d_1 - 2 e_{1,1})!} , \qquad (10) $$

given that from Eqs. (2) and (6) we have $e_{1,2} = d_1 - 2 e_{1,1}$ and
$e_{2,2} = M - d_1 + e_{1,1}$. Notice that $\mathcal{P}_2(\{e_{1,1}\})$ depends only on
$e_{1,1}$, since $e_{2,2}$ is fixed for any value of $e_{1,1}$ and vice versa. Interestingly, the
distribution of Eq. (10) has also been found as the solution of a completely different problem in
survival analysis, where it is known as the Univariate Twins Distribution, and it also has
applications to the study of the genetic variability of neutral alleles in a population [43].
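Eq. (10) is simple enough to be evaluated exactly. The following sketch uses rational arithmetic to tabulate the distribution for toy values (M = 6 and d_1 = 4, chosen here only for illustration) and to verify that it is normalized; the function name `p2` is an assumption of this example.

```python
from fractions import Fraction
from math import factorial


def p2(e11, d1, M):
    """Eq. (10): probability of e11 internal edges in a group of degree d1."""
    e12 = d1 - 2 * e11   # edges between the two groups
    e22 = M - d1 + e11   # internal edges of the second group
    if min(e11, e12, e22) < 0:
        return Fraction(0)
    numerator = factorial(M) * factorial(d1) * factorial(2 * M - d1) * 2 ** e12
    denominator = (
        factorial(2 * M) * factorial(e11) * factorial(e22) * factorial(e12)
    )
    return Fraction(numerator, denominator)


# Toy values: a network with M = 6 edges and a group of degree d1 = 4.
M, d1 = 6, 4
distribution = {e11: p2(e11, d1, M) for e11 in range(d1 // 2 + 1)}
total = sum(distribution.values())
mean = sum(e11 * p for e11, p in distribution.items())
```

For these values the probabilities are 16/33, 16/33 and 1/33 for $e_{1,1}$ = 0, 1 and 2, and the mean is 6/11, which coincides with $d_1 (d_1 - 1) / [2 (2M - 1)]$.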

For C = 3, the calculations are a little more cumbersome but we obtain

$$ \mathcal{P}_3(\{e_{1,1}, e_{2,2}, e_{3,3}\}) = \frac{M!\; 2^{M - M_{\rm int}}}{(2M)!} \prod_{\varphi=1}^{3} \frac{d_\varphi!}{e_{\varphi,\varphi}!\, (M - M_{\rm int} - d_\varphi + 2 e_{\varphi,\varphi})!} , \qquad (11) $$

where $M_{\rm int} = \sum_{\varphi=1}^{3} e_{\varphi,\varphi}$ is the total number of
intra-community edges.

The general case (i.e., an arbitrary number of groups C) includes a sum over all the possible
configurations of the inter-group connections. This makes the calculation of
$\mathcal{P}_C(\{e_{\alpha,\alpha}\})$ quite hard; in fact, we were not able to find a closed
analytical form for it. This problem is similar to those appearing in the enumeration of
contingency tables (whose most celebrated examples are the Latin and magic squares) and still
represents an open problem in combinatorics [44-46]. It is still possible to determine the sum
numerically, with a computational time growing as $M^{C^2/2 - 3C/2}$ [the number of free indices
in the sum of Eq. (9) is $C^2/2 - 3C/2$]. Another possibility is to relax the constraints of
Eqs. (6), considering the groups as independent of each other. This "pair approximation" yields



$$ \mathcal{P}_C(\{e_{\alpha,\alpha}\}) \simeq \tilde{\mathcal{P}}_C(\{e_{\alpha,\alpha}\}) = \prod_{\varphi=1}^{C} \mathcal{P}_2(\{e_{\varphi,\varphi}\}) , \qquad (12) $$

which stands for the product of C independent bipartitions, each of them weighted by the
probability $\mathcal{P}_2(\{e_{\varphi,\varphi}\})$ of Eq. (10), where the constraints are now
simply $2 e_{\varphi,\varphi} \le d_\varphi$, $\forall \varphi = 1, \ldots, C$. Owing to the
reduced computational burden, this approximation can be helpful in cases where a fast evaluation
of $\mathcal{P}_C(\{e_{\alpha,\alpha}\})$ is needed. We expect it to work better when the number
of communities C is larger.


III. MODULARITY FUNCTION

A. Modularity distribution in the configurational model

Up to now we have introduced a formalism that allows us to compute, given C groups of nodes and
their degree sequence $\{d_\alpha\}$, the probability distribution function that such groups have
a set $\{e_{\alpha,\alpha}\}$ of internal connections, under the hypothesis that the network is
generated according to the configurational model algorithm. As explained before, the modularity
function $Q_C$ of a partition into C groups with degree sequence $\{d_\alpha\}$ and internal
connectivities $\{e_{\alpha,\alpha}\}$ is defined as

$$ Q_C(\{e_{\alpha,\alpha}\}) = \frac{1}{M} \sum_{\varphi=1}^{C} \left( e_{\varphi,\varphi} - \langle e_{\varphi,\varphi} \rangle \right) = \frac{M_{\rm int} - \mathcal{E}_C(\{d_\alpha\})}{M} , \qquad (13) $$

where $M_{\rm int} = \sum_{\varphi=1}^{C} e_{\varphi,\varphi}$ is the total number of
intra-community edges and
$\mathcal{E}_C(\{d_\alpha\}) = \sum_{\varphi=1}^{C} \langle e_{\varphi,\varphi} \rangle$
represents the sum of the expected internal connectivities over all modules and is determined by
the degree sequence of the modules $\{d_\alpha\}$. The average value of the intra-community edges
of the module $\varphi$ can be obtained by marginalizing the general distribution
$\mathcal{P}_C(\{e_{\alpha,\alpha}\})$ of Eq. (9) and turns out to be

$$ \langle e_{\varphi,\varphi} \rangle = \frac{d_\varphi (d_\varphi - 1)}{2 (2M - 1)} . \qquad (14) $$

Notice that this average value is slightly different from the one used in the original
formulation of the modularity, i.e.,
$\langle e_{\varphi,\varphi} \rangle = d_\varphi^2 / (4M)$, which is a rougher approximation to
the value expected in the configurational model. The probability that the modularity function
takes a value Q for the networks of the null model ensemble can then be calculated as

$$ \mathcal{P}_C(Q) = \sum_{\{e_{\alpha,\alpha}\}} \mathcal{P}_C(\{e_{\alpha,\alpha}\}) \; \delta\left[ Q_C(\{e_{\alpha,\alpha}\}) - Q \right] . \qquad (15) $$

Note that the term $\delta\left[ Q_C(\{e_{\alpha,\alpha}\}) - Q \right]$ adds to Eqs. (6) the new
constraint $M_{\rm int} = M Q + \mathcal{E}_C(\{d_\alpha\})$. For instance, this implies that for
C = 2 and C = 3 the distribution of the modularity in the configurational model can be obtained
by modifying Eqs. (10) and (11) accordingly.


B. Properties of $Q_C$ and $\mathcal{P}_C(Q)$

We now illustrate some characteristics of $Q_C$ and its distribution $\mathcal{P}_C(Q)$ in the
null model with a few examples simple enough to admit an analytic or semi-analytic treatment. The
interest in the use of modularity is generally focused on the search for the partition with the
maximum $Q_C$. This search, as has been discussed, is a hard problem [14], mainly due to the huge
number of almost degenerate local maxima in the modularity landscape [32]. Such an abundance of
local maxima has been found even when modularity optimization is applied to the random networks
generated with the configurational model. With our formalism we are not able to judge whether a
partition is a local maximum in the $Q_C$ landscape, but we can already expose the problem of the
abundance of local structure by considering our results from a more restricted point of view. For
the first of our examples, we choose to split the null model networks into three groups, a case
for which we can obtain analytical solutions. We compute the average value $\langle Q_3 \rangle$
and the standard deviation $\sigma_Q$ of $\mathcal{P}_3(Q)$ as a function of the relative degrees
of the communities [i.e., $d_1/(2M)$ and $d_2/(2M)$]. These quantities are calculated only over
the partitions corresponding to the top q% instances of the modularity. For q = 100,
$\langle Q_3 \rangle = 0$ everywhere, as expected since the expected modularity in the null model
is zero, while the standard deviation exhibits a regular behavior. The results for $\sigma_Q$ can
be seen in panel (a) of Figure 2. Then, to approximate the local maxima of $Q_C$, we restrict the
calculations to the top q = 5% instances of the modularity distribution only. Recall that we are
doing this analytically, so the precision of the analysis does not suffer from concentrating on
extreme values. In panel (b) of Figure 2 one can observe how the average is no longer null and
varies consistently from the region of imbalanced partitions [i.e., $d_\varphi/(2M)$ far from
1/3 for at least one group $\varphi$] to the zone of homogeneous partitions [i.e.,
$d_\varphi/(2M) \simeq 1/3$ for all $\varphi$]. There is a wide region in which large changes of
$d_1/(2M)$ and $d_2/(2M)$ do not produce important variations in the average value. At the same
time, it is possible to observe a fine structure pointing to a rich local landscape geometry for
$Q_3$. This result is just indicative, since the projection of the partition space onto a plane
with only two parameters [$d_1/(2M)$ and $d_2/(2M)$] is too coarse; more systematic methods to
perform such a projection exist. The standard deviation of the top 5% modularity instances (panel
c of Figure 2) continues to be large for homogeneous partitions and decreases as the partition
becomes more imbalanced, following patterns similar to those of $\langle Q_3 \rangle$.

Figure 2: (Color online) For the partitions corresponding to the top q% of modularity, we compute
the average and standard deviation, over this ensemble, of the modularity $Q_3$ as a function of
the relative degrees of the groups [i.e., $d_1/(2M)$ and $d_2/(2M)$]. For q = 100, the average
value (not shown) is zero for every value of $d_1/(2M)$ and $d_2/(2M)$. On the other hand, the
standard deviation (panel a) tends to be small when the degree of one of the communities is small
and grows as the communities become similar in their degrees. For q = 5 (panels b and c), both
the average value and the standard deviation grow as the partition becomes more homogeneous. Here
we set M = 100.

We consider next another interesting application, related to the so-called resolution limit of
the modularity function [29, 30]. We analyze all the possible divisions into C = 3 groups [as
before, monitored as a function of the relative degrees of two groups, $d_1/(2M)$ and
$d_2/(2M)$] and calculate the modularity $Q_3$. For fixed $d_1/(2M)$ and $d_2/(2M)$ [and hence
$d_3/(2M)$], we also calculate $Q_2$, which is the modularity of the partition with groups 1 and
2 merged together. The quantity $Q_3 - Q_2$ is then measured, and its average value and standard
deviation over all partitions corresponding to the top q% values of $Q_3$ are estimated. Note
that if $Q_2 > Q_3$, according to modularity optimization it would be more convenient to merge
the two communities. When all the partitions are considered (i.e., q = 100), the average is
always zero and the standard deviation (see Figure 3a) shows a regular pattern with maximum at
$d_1/(2M) = d_2/(2M) = 1/2$. When, again to approximate the local extrema of the Q distribution,
only the top 5% of the partitions is considered, the difference between $Q_3$ and $Q_2$ is no
longer zero, and there is a wide range of values of $d_1/(2M)$ and $d_2/(2M)$ for which
$Q_2 > Q_3$ (see Figure 3b). This happens when at least one of the merged communities is "small";
the limit of resolution is related to $\sqrt{M}$ [29]. Modularity optimization would then tend to
aggregate the two groups into one under such circumstances, regardless of the properties of the
other groups. The standard deviation of $Q_3 - Q_2$ in the top 5% behaves differently from what
is observed for q = 100. The maximal standard deviation is obtained for homogeneous partitions,
while it decreases as the partition becomes more and more imbalanced, as can be seen in
Figure 3c.
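The sign of $Q_3 - Q_2$ can be made explicit with a short calculation. The following is only a sketch, which assumes that the null-model expectation $d (d - 1) / [2 (2M - 1)]$ is used for every group in both partitions; the subscript $1 \cup 2$ for the merged group is notation introduced here for illustration.

```latex
\begin{align*}
e_{1\cup 2,\,1\cup 2} &= e_{1,1} + e_{2,2} + e_{1,2}, \qquad d_{1\cup 2} = d_1 + d_2, \\
\langle e_{1\cup 2,\,1\cup 2} \rangle - \langle e_{1,1} \rangle - \langle e_{2,2} \rangle
  &= \frac{(d_1 + d_2)(d_1 + d_2 - 1) - d_1 (d_1 - 1) - d_2 (d_2 - 1)}{2 (2M - 1)}
   = \frac{d_1 d_2}{2M - 1}, \\
Q_3 - Q_2 &= \frac{1}{M} \left( \frac{d_1 d_2}{2M - 1} - e_{1,2} \right).
\end{align*}
```

Under these assumptions, merging groups 1 and 2 increases the modularity exactly when the number of edges $e_{1,2}$ between them exceeds its null-model expectation $d_1 d_2 / (2M - 1)$. Since $e_{1,2}$ is an integer, a single connecting edge is already enough whenever $d_1 d_2 < 2M - 1$, which is consistent with a threshold of order $\sqrt{M}$ on the degrees of the merged groups.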



Figure 3: (Color online) For the partitions corresponding to the top q% of modularity, we compute
the average and standard deviation, over this ensemble, of the difference $Q_3 - Q_2$ as a
function of the relative degrees of the groups [i.e., $d_1/(2M)$ and $d_2/(2M)$]. For q = 100,
the average is always zero, while the standard deviation (panel a) grows as $d_1/(2M)$ and
$d_2/(2M)$ tend to 1/2 (i.e., the third community is empty). For q = 5, there is a region in
which $Q_2$ is larger than $Q_3$ (the red region in panel b), while $Q_3 - Q_2$ grows as
$d_1/(2M)$ and $d_2/(2M)$ tend to 1/2. The standard deviation (panel c) is maximal for a
homogeneous split of the network [i.e., $d_1/(2M) = d_2/(2M) = 1/3$] and regularly decreases to
zero as one moves away from the homogeneous split. Here we set M = 100.


IV. STATISTICAL SIGNIFICANCE OF PARTITIONS

The most important application of finding an explicit form for the distribution of the modularity
values of the partitions of the random graphs of a null model, such as the configurational model,
is that the extremes of the distribution offer comparison points to establish the statistical
significance of the partitions of equivalent real networks [31, 36]. Given the degree sequence of
the communities $\{d_\alpha\}$, Eq. (15) provides the probability distribution of the modularity
function $\mathcal{P}_C(Q \,|\, \{d_\alpha\})$. In order to consider the different partitions of
a graph, we need to obtain the unconditional probability $\mathcal{P}_C(Q)$ (conditioned only on
the node degree sequence). This probability can be obtained from the convolution

$$ \mathcal{P}_C(Q) = \sum_{\{d_\alpha\}} \mathcal{P}_C(Q \,|\, \{d_\alpha\}) \; \mathcal{P}_C(\{d_\alpha\}) , \qquad (16) $$



where $\mathcal{P}_C(\{d_\alpha\})$ depends also on the degree sequence of the nodes in the
network (i.e., $\{k_i\}$). The computation of this probability is very expensive and we have done
it only for C = 2. In this case, the number of partitions in which one of the groups has degree
$d_1$ can be obtained as

$$ G_2(\{d_1\}) = \sum_{\{n_k\}} \prod_{k} \binom{N_k}{n_k} , \qquad (17) $$

where $N_k$ indicates the number of nodes with degree k present in the network and $n_k$ the
number of vertices with k connections belonging to the group. The sum is subject to the
constraints

$$ 0 \le n_k \le N_k \quad \text{and} \quad d_1 = \sum_{k} k \, n_k . \qquad (18) $$

The resulting probability can be calculated as
$\mathcal{P}_2(\{d_1\}) = G_2(\{d_1\}) / 2^N$.
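The number of bipartitions with a given group degree can also be obtained without listing the sets $\{n_k\}$ explicitly. The sketch below counts directly the node subsets whose degrees add up to $d_1$, which gives the same number as grouping the nodes by degree as in Eq. (17); it relies on a standard subset-sum recursion, which is a technique chosen here for illustration and not necessarily the one used to produce the results of this paper. The function name is hypothetical and the degree sequence is the one of Figure 1.

```python
def bipartition_degree_counts(degrees):
    """counts[d1] = number of node subsets whose degrees sum to d1."""
    counts = [0] * (sum(degrees) + 1)
    counts[0] = 1  # the empty group
    for k in degrees:
        # Traverse downwards so that each node is used at most once.
        for d1 in range(len(counts) - 1, k - 1, -1):
            counts[d1] += counts[d1 - k]
    return counts


degrees = [3, 4, 2, 1, 1, 1]  # degree sequence of Figure 1
g2 = bipartition_degree_counts(degrees)
# Probability of each group degree d1 among the 2^N labeled bipartitions.
p_d1 = [g / 2 ** len(degrees) for g in g2]
```

The cost of this recursion grows as N times 2M, so that the distribution of $d_1$ remains accessible for networks of moderate size.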

We consider next, as examples, three social networks: the unweighted and weighted versions of the
Zachary Karate Club [47] and the friendship network between dolphins [48]. In Figure 4 we plot
the cumulative distribution of Q for the configurational model graphs obtained with the degree
sequences of the nodes of these networks. As the main plot shows, the distribution of Q depends
on the original network (that is, on the particular degree sequence of its nodes). Inset (a) of
the figure shows that the conditional distributions of Q for different values of $d_1$ differ
(they have the same average value but different standard deviations), so that the resulting
unconditional $\mathcal{P}_2(Q)$ strongly depends on the shape of $\mathcal{P}_2(\{d_1\})$ and
therefore on the degree sequence (see Figure 4b). The modularity calculated for the original
bipartitions of these networks is high when compared with the typical values observed for the
bipartitions of the equivalent graphs generated by the configurational model. The modularities
found for the partitions of the real networks are: $Q_{\rm real} = 0.37469$ for the unweighted
version of the Zachary Karate Club, $Q_{\rm real} = 0.395959$ for the weighted version of the
same network, and $Q_{\rm real} \simeq 0.374779$ for the Dolphins social network. In all three
cases, the probability of finding such values among all the partitions of the equivalent
configurational-model random graphs is quite low. Still, this method of evaluating the
significance of a partition presents a bias. Since all the possible partitions are considered for
$\mathcal{P}_C(Q)$, even those with low modularity and disconnected groups, the partitions found
by a modularity optimization algorithm will generally tend to be labeled as "unlikely". A
possible solution, in the spirit of our recent work [36], is to restrict the sum in Eq. (16) to a
suitable subset of partitions. An example could be the partitions that are local maxima in the
$Q_C$ landscape when the random graphs generated by applying the configurational model to the
given network are analyzed. This, however, involves a systematic search for such maxima that goes
beyond the scope of this paper.
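For bipartitions, the whole significance estimate can be assembled in a few lines. The sketch below combines the distribution of the group degree $d_1$ with the conditional probability of the internal edges to obtain the probability that a bipartition of a null-model graph reaches a modularity of at least a given value. It is a toy illustration under stated assumptions: the function names are hypothetical, the expected number of internal edges is taken as $d (d - 1) / [2 (2M - 1)]$, all $2^N$ labeled bipartitions (including the one with an empty group) are counted, and the degree sequence is the small one of Figure 1 rather than that of a real network.

```python
from fractions import Fraction
from math import factorial


def p2(e11, d1, M):
    """Probability of e11 internal edges in a group of degree d1 (C = 2)."""
    e12 = d1 - 2 * e11
    e22 = M - d1 + e11
    if min(e11, e12, e22) < 0:
        return Fraction(0)
    numerator = factorial(M) * factorial(d1) * factorial(2 * M - d1) * 2 ** e12
    denominator = (
        factorial(2 * M) * factorial(e11) * factorial(e22) * factorial(e12)
    )
    return Fraction(numerator, denominator)


def subset_degree_counts(degrees):
    """counts[d1] = number of node subsets whose degrees sum to d1."""
    counts = [0] * (sum(degrees) + 1)
    counts[0] = 1
    for k in degrees:
        for d1 in range(len(counts) - 1, k - 1, -1):
            counts[d1] += counts[d1 - k]
    return counts


def tail_probability(degrees, q_obs):
    """Probability that a null-model bipartition has modularity >= q_obs."""
    M = sum(degrees) // 2
    n_assignments = 2 ** len(degrees)
    tail = Fraction(0)
    for d1, g in enumerate(subset_degree_counts(degrees)):
        d2 = 2 * M - d1
        expected = Fraction(d1 * (d1 - 1) + d2 * (d2 - 1), 2 * (2 * M - 1))
        for e11 in range(d1 // 2 + 1):
            e22 = M - d1 + e11
            q = (e11 + e22 - expected) / M
            if q >= q_obs:
                tail += Fraction(g, n_assignments) * p2(e11, d1, M)
    return tail


degrees = [3, 4, 2, 1, 1, 1]  # degree sequence of Figure 1
significance = tail_probability(degrees, Fraction(1, 2))
```

A small value of `significance` indicates that the observed modularity is rarely reached among all bipartitions of the randomized graphs, with the caveat about the bias of this ensemble discussed in the text.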
Figure 4: (Color online) Cumulative distribution function of modularity for bipartitions,
$\mathcal{P}_2(> Q)$, calculated for three real networks: Zachary Karate Club, unweighted (thick
black line) and weighted (thin red curve), and Dolphins social network (dotted blue line). In
inset (a), we plot $\mathcal{P}_2(Q \,|\, \{d_1\})$, only for the unweighted version of the
Zachary Karate Club, for $d_1 = 28$ (black circles) and $d_1 = 78$ (red squares). In inset (b),
we report $\mathcal{P}_2(\{d_1\}) = G_2(\{d_1\}) / 2^N$ for the same networks.



V. DIRECTED AND BIPARTITE NETWORKS 

Our combinatorial approach can be easily extended to directed and bipartite networks. In these
cases, one needs to distinguish two classes of nodes (bipartite) or of connections (directed).
The new null model (i.e., the extension of the configurational model) must reflect this
distinction, so that two different lists of labels are constructed simultaneously.

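We start with directed networks. Fixing C groups means defining two degree sequences
$\{d_\alpha^{\rm in}\}$ and $\{d_\alpha^{\rm out}\}$, corresponding to the sequences of in-coming
and out-going connections, respectively. In analogy with Eq. (6), each number appearing in these
sequences is given by the sum of the in- or out-degrees of all nodes belonging to a given group.
The total number of possible label sequences that can be formed is

$$ \mathcal{T}_C^{\rm dir}(\{d_\alpha^{\rm in}\}, \{d_\alpha^{\rm out}\}) = \frac{M!}{d_1^{\rm in}!\, d_2^{\rm in}! \cdots d_C^{\rm in}!} \; \frac{M!}{d_1^{\rm out}!\, d_2^{\rm out}! \cdots d_C^{\rm out}!} , \qquad (19) $$

with constraints given by

$$ \sum_{\varphi=1}^{C} d_\varphi^{\rm in} = \sum_{\varphi=1}^{C} d_\varphi^{\rm out} = M . $$

Eq. (19) is the product of the total numbers of lists of community labels that can be constructed
for the in-coming and out-going stubs, respectively. The total number of lists of community
labels that satisfy the constraints imposed by a given set
$\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}$ is

$$ \mathcal{N}_C^{\rm dir}(\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}) = M! \prod_{\varphi=1}^{C} \prod_{\theta=1}^{C} \frac{1}{e_{\varphi,\theta}!} , \qquad (20) $$

which is the analog of Eq. (7), but corrected in this case for the absence of symmetry (i.e., it
may happen that $e_{\varphi,\theta} \neq e_{\theta,\varphi}$). The probability of observing a
configuration with intra- and inter-community connectivities given by
$\{e_{\alpha,\alpha}, e_{\alpha,\beta}\}$ is again the ratio
$\mathcal{N}_C^{\rm dir} / \mathcal{T}_C^{\rm dir}$, while the marginal distribution for the
intra-community connections $\{e_{\alpha,\alpha}\}$ alone can be calculated by summing over all
values of the inter-community arcs subject to the constraints
$d_\varphi^{\rm in} = \sum_{\theta} e_{\theta,\varphi}$ and
$d_\varphi^{\rm out} = \sum_{\theta} e_{\varphi,\theta}$. As in the case of undirected networks,
for C = 2 no sum is effectively required and the computation of the marginal probabilities is
straightforward (for C = 3 the six inter-community arcs are subject to only five independent
constraints, so that a sum over one free index remains). For C = 2, for example, we obtain

$$ \mathcal{P}_2^{\rm dir}(\{e_{1,1}\}) = \frac{d_1^{\rm in}!\, (M - d_1^{\rm in})!}{M!\; e_{1,1}!\, (d_1^{\rm in} - e_{1,1})!} \; \frac{d_1^{\rm out}!\, (M - d_1^{\rm out})!}{(d_1^{\rm out} - e_{1,1})!\, (M - d_1^{\rm in} - d_1^{\rm out} + e_{1,1})!} , \qquad (21) $$

with average $\langle e_{1,1} \rangle = d_1^{\rm in} d_1^{\rm out} / M$. Eq. (21) can be used
directly for the computation of the probability distribution of the modularity since, for
directed networks, $Q_C$ is defined with an expression similar to the one in Eq. (13) for
undirected networks (only the term for the expected value of internal links in the null model
changes) [25].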
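Eq. (21) has the form of a hypergeometric distribution, which makes it easy to check numerically. The sketch below evaluates it with exact rational arithmetic for toy values (M = 6, $d_1^{\rm in} = 3$, $d_1^{\rm out} = 4$, chosen here only for illustration); the function name `p2_directed` is an assumption of this example.

```python
from fractions import Fraction
from math import factorial


def p2_directed(e11, d_in, d_out, M):
    """Eq. (21): probability of e11 internal arcs in group 1 (directed, C = 2)."""
    e21 = d_in - e11               # arcs entering group 1 from group 2
    e12 = d_out - e11              # arcs leaving group 1 towards group 2
    e22 = M - d_in - d_out + e11   # internal arcs of group 2
    if min(e11, e21, e12, e22) < 0:
        return Fraction(0)
    numerator = (
        factorial(d_in) * factorial(M - d_in)
        * factorial(d_out) * factorial(M - d_out)
    )
    denominator = (
        factorial(M) * factorial(e11) * factorial(e21)
        * factorial(e12) * factorial(e22)
    )
    return Fraction(numerator, denominator)


# Toy values: M = 6 arcs, group 1 with in-degree 3 and out-degree 4.
M, d_in, d_out = 6, 3, 4
distribution = {e: p2_directed(e, d_in, d_out, M) for e in range(min(d_in, d_out) + 1)}
total = sum(distribution.values())
mean = sum(e * p for e, p in distribution.items())
```

For these values the distribution is normalized and its mean equals $d_1^{\rm in} d_1^{\rm out} / M = 2$, in agreement with the average quoted after Eq. (21).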

A similar procedure also applies to bipartite networks. In this case nodes are divided into two
classes and only vertices belonging to different classes can be connected. The equations valid
for the case of directed networks can be directly applied to bipartite networks. There are two
different definitions of modularity for bipartite networks. In the definition of Barber [49],
modules can be composed of nodes of both classes, and therefore the probability distribution of
the modularity can be calculated directly from the previous equations. The definition of Guimerà
et al. [50], by contrast, requires that modules be composed only of vertices of the same type. In
this case our equations need to be modified; in particular, Eqs. (19) and (20) should take into
account explicitly the presence of $C_1$ and $C_2$ groups with different types of nodes instead
of only C modules.



VI. SUMMARY AND CONCLUSIONS 

The study of the community structure of networks has attracted much attention in recent years.
Most of the work performed in this field of research has focused on the so-called modularity
function, which has become a standard in this context, with widespread usage in many different
disciplines. Modularity has the appealing feature of condensing into a single number the strength
and significance of the whole community structure of a network. Modularity is based on the
comparison between the level of internal links in a given graph partition and the expected value
of this quantity in the configurational model. This model generates the ensemble of all
uncorrelated networks compatible with the one under study and therefore constitutes a good term
of comparison for the evaluation of correlations such as those underlying the existence of
communities. In this paper, we study the modularity via complete enumeration of the partitions of
the networks generated by the configurational model. Our combinatorial approach allows us to
formulate exact calculations in the framework of the null model and therefore to write an
equation for the probability distribution function of the modularity. Thanks to this, we are able
to study several interesting features of modularity. We focus on the so-called resolution limit
of modularity, which is statistically observable in the best partitions of the configurational
model, and on the properties of the top-ranking instances of the modularity, which can be related
to the local maxima in the $Q_C$ landscape. We additionally study an estimator of the statistical
significance of partitions in networks by measuring how probable it is to observe a particular
value of the modularity in the configurational model. As warned in the text, however, this
technique is better applied to a distribution of $Q_C$ restricted to a smaller, more selective
set of partitions.